LeetCode API Design Evaluation and Latency Budget

Learn to meet non-functional requirements and estimate the response time of the LeetCode service.

Introduction#

This lesson discusses how to achieve the non-functional requirements of the LeetCode API and how to improve its response time. We’ll analyze the response time, especially for the contest service, which is a key factor in the efficiency of the overall service.

Non-functional requirements#

The following section discusses how different non-functional requirements are achieved.

Availability and scalability#

The GraphQL architecture reduces the number of calls by bundling multiple requests into one, which helps the service stay available and scale to more users. The pub-sub service decouples the microservices and enables asynchronous communication between them. Finally, pagination improves the availability and scalability of the overall service by reducing the load on the network and backend.
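As a sketch of how GraphQL bundles what would otherwise be several REST calls, the hypothetical query below fetches a user's profile, a paginated problem list, and contest standings in one round trip. All field and argument names are illustrative, not LeetCode's actual schema:

```python
# A single hypothetical GraphQL query replacing three separate REST calls.
# Field and argument names are illustrative, not LeetCode's real schema.
BUNDLED_QUERY = """
query Dashboard($username: String!, $limit: Int!, $offset: Int!) {
  user(username: $username) { rank solvedCount }
  problems(limit: $limit, offset: $offset) { title difficulty }
  contestRanking(username: $username) { rating globalRank }
}
"""

def build_request(username: str, limit: int = 50, offset: int = 0) -> dict:
    """Package the bundled query and its variables as one HTTP POST body."""
    return {
        "query": BUNDLED_QUERY,
        "variables": {"username": username, "limit": limit, "offset": offset},
    }

request_body = build_request("alice", limit=50, offset=0)
# One network call now carries what three REST endpoints would have returned,
# and the limit/offset variables give us pagination for free.
```

The `limit`/`offset` variables also show how pagination fits naturally into the same request.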

Security#

The security of the service focuses on the users and backend services. The following steps are taken to ensure the security of the service:

  • Only authenticated and authorized users are allowed to access the services. Users with basic authentication (registered users) can perform basic operations in the LeetCode service, such as listing problems and reading comments. Authorized users (holding JWT tokens) can perform primary operations, such as practicing problems, commenting, and participating in contests.

  • Code submissions are executed inside containers to guard against malicious code. The containers run submitted code in isolation, without letting it interact with the host system, and with only minimal privileges so that it cannot affect the system at any point.
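One way to build such an isolated execution environment, sketched here under the assumption that Docker is the container runtime, is to compose a locked-down `docker run` command for each submission. The image name, limits, and paths are illustrative choices:

```python
# Sketch: compose a locked-down `docker run` command for an untrusted
# submission. Image name, limits, and paths are illustrative choices.
def sandbox_command(image: str, code_path: str) -> list[str]:
    return [
        "docker", "run", "--rm",
        "--network", "none",        # no network: the code cannot call out
        "--read-only",              # immutable root filesystem
        "--memory", "256m",         # cap memory usage
        "--cpus", "0.5",            # cap CPU usage
        "--pids-limit", "64",       # prevent fork bombs
        "--user", "1000:1000",      # never run as root
        "-v", f"{code_path}:/src:ro",
        image, "python", "/src/solution.py",
    ]

cmd = sandbox_command("python:3.12-slim", "/tmp/sub_42")
# subprocess.run(cmd, timeout=5) would add a wall-clock cap on execution.
```

Each flag removes one avenue by which submitted code could "speak with" the host: networking, the filesystem, resource exhaustion, and root privileges.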

Latency#

Containers not only allow us to scale well but also reduce latency because of their lightweight nature. Caching container configurations further reduces the latency of spinning up containers and executing code for frequently used environments. This is particularly helpful in contests, where problems are executed on containers with the same configuration. We also use GraphQL, which is well known for fetching only the targeted data, further reducing latency.
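One way to realize this configuration caching, sketched under the assumption that identically configured containers are interchangeable, is a warm pool keyed by the environment's configuration; all names here are illustrative:

```python
from collections import defaultdict

class WarmContainerPool:
    """Reuse already-started containers for a given configuration
    instead of paying the cold-start cost each time (illustrative)."""

    def __init__(self) -> None:
        self._idle = defaultdict(list)   # config key -> idle container ids
        self._started = 0                # cold starts performed so far

    def acquire(self, config: str) -> str:
        if self._idle[config]:
            return self._idle[config].pop()      # warm hit: no startup cost
        self._started += 1                       # cold start: spin up a new one
        return f"{config}-container-{self._started}"

    def release(self, config: str, container_id: str) -> None:
        self._idle[config].append(container_id)  # keep warm for the next run

pool = WarmContainerPool()
c1 = pool.acquire("python3.12")
pool.release("python3.12", c1)
c2 = pool.acquire("python3.12")   # reuses c1; no second container started
```

In a contest, where thousands of submissions target the same environment, nearly every acquisition after the first is a warm hit.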

The following table summarizes the approaches used to achieve each non-functional requirement.

Achieving Non-Functional Requirements

Availability and scalability

  • Circuit breaker to avoid requesting the failed service
  • Rate-limiting for API access and contestants
  • Use of GraphQL and pub-sub service
  • API monitoring for on-time alerts
  • Pagination

Security

  • Basic authentication
  • Authorization using JWT

Low latency

  • Containers instead of VMs for quick execution of codes
  • Caching for repetitive requests
  • GraphQL data format to reduce data size

Latency budget#

We will analyze the response time of two representative operations: registering a user (handled by the users service) and fetching a list of problems (handled by the problems service).

Note: As discussed in the Back-of-the-Envelope Calculations for Latency chapter, the latency of GET and POST requests is affected by two different parameters. For a GET request, the average RTT remains the same regardless of the data size (because the request itself is small), while the time to download the response grows by 0.4 ms per KB. For a POST request, the RTT increases by 1.15 ms per KB of request size on top of the base RTT of 260 ms.
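These two rules can be restated as small helper functions. The constants come from this lesson's calculations (base times of 120.5 and 201.5 ms, a fixed GET RTT of 70 ms, and a base POST RTT of 260 ms); the function names are illustrative:

```python
GET_RTT_MS = 70.0         # GET RTT is constant, regardless of data size
POST_BASE_RTT_MS = 260.0  # POST RTT grows with request size
DOWNLOAD_MS_PER_KB = 0.4  # response download cost
POST_MS_PER_KB = 1.15     # per-KB growth of the POST RTT

def get_latency_ms(base_ms: float, response_kb: float) -> float:
    """GET: fixed RTT plus per-KB download time for the response."""
    return base_ms + GET_RTT_MS + DOWNLOAD_MS_PER_KB * response_kb

def post_latency_ms(base_ms: float, request_kb: float,
                    response_kb: float = 1.0) -> float:
    """POST: base RTT grows by 1.15 ms per KB of request,
    plus the download time of the (small) response."""
    return (base_ms
            + POST_BASE_RTT_MS + POST_MS_PER_KB * request_kb
            + DOWNLOAD_MS_PER_KB * response_kb)
```

For example, `post_latency_ms(120.5, 1)` reproduces the 382.05 ms minimum latency derived for user registration below, and `get_latency_ms(120.5, 550)` reproduces the 410.5 ms minimum latency for listing problems.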

Register user#

The size of the request and response to register a user is estimated below:

  • Request size: The user’s information is approximately 300 bytes. The total request size, including the header, is 1 KB.

  • Response size: The response to the request to register a user is approximately 1 KB because it only includes the status code of the request.

We’ll consider only the request size, as it is what affects the response time here (the 1 KB response adds only a fixed 0.4 ms of download time).

Response time#

Processing this request is simple: a single server saves the data to or retrieves it from a database, taking the minimum processing time, as depicted in the following illustration:

Processing time of LeetCode service

We'll now estimate the response time for the request to register a user as given below:

Response time to register a user (request size: 1 KB):

  • Minimum latency: 382.05 ms
  • Maximum latency: 463.05 ms
  • Minimum response time: 386.05 ms
  • Maximum response time: 467.05 ms

Assuming the request size is 1 KB:

Time_{latency} = Time_{base} + RTT_{post} + Download

RTT_{post} = RTT_{base} + 1.15 \times Size = 260\ ms + 1.15\ ms \times 1\ KB

Time_{latency\_min} = Time_{base\_min} + (RTT_{base} + 1.15 \times size\ of\ request\ (KB)) + 0.4 = 120.5 + (260 + 1.15 \times 1) + 0.4 = 382.05\ ms

Time_{latency\_max} = Time_{base\_max} + (RTT_{base} + 1.15 \times size\ of\ request\ (KB)) + 0.4 = 201.5 + (260 + 1.15 \times 1) + 0.4 = 463.05\ ms

Similarly, the response time is calculated as:

Time_{Response\_min} = Time_{latency\_min} + Time_{processing\_min} = 382.05\ ms + 4\ ms = 386.05\ ms

Time_{Response\_max} = Time_{latency\_max} + Time_{processing\_max} = 463.05\ ms + 4\ ms = 467.05\ ms
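As a quick sanity check, the registration arithmetic can be scripted. The constants (base times of 120.5/201.5 ms, base POST RTT of 260 ms, 4 ms processing) come from this lesson; the function name is illustrative:

```python
def register_response_time_ms(base_ms: float, request_kb: float = 1.0,
                              processing_ms: float = 4.0) -> float:
    # POST latency: base time + size-dependent RTT + 0.4 ms response download
    latency = base_ms + (260 + 1.15 * request_kb) + 0.4
    return latency + processing_ms

reg_min = register_response_time_ms(120.5)   # minimum base time
reg_max = register_response_time_ms(201.5)   # maximum base time
```

This reproduces the 386.05 ms minimum and 467.05 ms maximum response times derived above.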

List problems#

The estimated size of the request and response to fetch a list of problems is given below:

  • Request size: The request size is approximately 1 KB, including query parameters to fetch the problems and header.

  • Response size: A single problem is approximately 11 KB. Suppose 50 problems are fetched in each request to be displayed on the first page; the response size would then be approximately 550 KB.

We’ll consider only the response size because it is what affects the response time here. Moreover, processing time can increase when problems are fetched from different servers, so we use 4 ms as the minimum processing time (one server processes the request) and 12 ms as the maximum (three servers process the request synchronously).

Response time to list problems (response size: 550 KB):

  • Minimum latency: 410.5 ms
  • Maximum latency: 491.5 ms
  • Minimum response time: 414.5 ms
  • Maximum response time: 503.5 ms

Assuming the response size is 550 KB, the latency is calculated as follows:

Time_{latency\_min} = Time_{base\_min} + RTT_{get} + 0.4 \times size\ of\ response\ (KB) = 120.5 + 70 + 0.4 \times 550 = 410.5\ ms

Time_{latency\_max} = Time_{base\_max} + RTT_{get} + 0.4 \times size\ of\ response\ (KB) = 201.5 + 70 + 0.4 \times 550 = 491.5\ ms

Similarly, the response time is calculated using the following equation:

Time_{Response} = Time_{latency} + Time_{processing}

Now, for minimum response time, we use minimum values of base time and processing time:

Time_{Response\_min} = Time_{latency\_min} + Time_{processing\_min} = 410.5\ ms + 4\ ms = 414.5\ ms

Now, for maximum response time, we use maximum values of base time and processing time:

Time_{Response\_max} = Time_{latency\_max} + Time_{processing\_max} = 491.5\ ms + 12\ ms = 503.5\ ms
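The GET-side calculation can likewise be scripted as a quick check. The constants (base times of 120.5/201.5 ms, a fixed 70 ms GET RTT, 0.4 ms per KB download, 4–12 ms processing) come from this lesson; names are illustrative:

```python
def list_problems_response_time_ms(base_ms: float, processing_ms: float,
                                   response_kb: float = 550.0) -> float:
    # GET latency: base time + fixed 70 ms RTT + 0.4 ms per KB of response
    latency = base_ms + 70 + 0.4 * response_kb
    return latency + processing_ms

list_min = list_problems_response_time_ms(120.5, 4)    # one server
list_max = list_problems_response_time_ms(201.5, 12)   # three servers
```

This reproduces the 414.5 ms minimum and 503.5 ms maximum response times derived above. Note how the 550 KB response dominates: its 220 ms download time dwarfs the 4–12 ms of processing, which is why pagination matters.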

The following illustration shows the minimum and maximum latency and processing time for simple CRUD requests:

Response time of users and problems service

Response time for the contest service#

The response time above is estimated for simple CRUD operations. The LeetCode contest service provides the functionality of contests where many contestants attempt a few problems and submit their solutions (N tries for each problem) to the service. The illustration below depicts how the backend system processes the entire contest using various pub-sub queues.

Workflow of a contest service
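The pub-sub decoupling in this workflow can be sketched as a minimal in-process topic queue: publishers (the contest API) enqueue submissions, and subscriber workers (the code-execution containers) drain them asynchronously. A real deployment would use a message broker; all names here are illustrative:

```python
from queue import Queue

# Minimal in-process stand-in for the pub-sub queues in the workflow.
topics: dict[str, Queue] = {}

def publish(topic: str, message: dict) -> None:
    """Producer side: enqueue a message without waiting for a consumer."""
    topics.setdefault(topic, Queue()).put(message)

def consume(topic: str) -> dict:
    """Consumer side: block until a message is available, then take it."""
    return topics[topic].get()

publish("submissions", {"user": "alice", "problem": 17, "code": "..."})
job = consume("submissions")   # a worker container picks up the submission
```

Because the producer never waits for the consumer, the contest API can accept submissions at a much higher rate than the containers execute them, with the queue absorbing the burst.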

Users can execute their code to test its correctness or submit it to the contest. Since the contest evaluation process is long and may even take days to produce results, we’ll focus only on the latency and response time estimation for code execution, as shown below.

We have seen the entire workflow of a contest above. Let’s now estimate the code execution latency. Suppose we have 15k contestants, and each submission attempt is around 1 KB. As the worst case, consider all 15k contestants submitting solutions simultaneously. In that case, the processing time of the service is estimated as follows:

Response time for code submission with minimum code execution time

Response time for code submission with maximum code execution time

While the code execution time (50 ms) remains the same on containers, the time it takes the pub-sub queue to forward code to the containers can vary. Assuming each submission takes 0.125 ms to process in the pub-sub queue, the earliest a submission can reach a container is 0.125 ms, whereas the last of the 15,000 simultaneous submissions waits approximately 1.875 s (0.125 ms × 14,999 ≈ 1,875 ms). The latency from the client to the API gateway can also vary depending on the network.
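The queueing arithmetic above can be restated in a few lines (0.125 ms per submission and 50 ms execution time are the lesson's assumptions):

```python
QUEUE_MS_PER_SUBMISSION = 0.125  # queue processing time per submission
EXECUTION_MS = 50.0              # container execution time per submission
SUBMISSIONS = 15_000             # worst case: everyone submits at once

# The earliest submission reaches a container after its own queue delay;
# the last submission waits for the 14,999 submissions ahead of it.
first_reach_ms = QUEUE_MS_PER_SUBMISSION
last_reach_ms = QUEUE_MS_PER_SUBMISSION * (SUBMISSIONS - 1)
last_finish_ms = last_reach_ms + EXECUTION_MS
```

Even in this worst case, queueing (≈ 1,875 ms) dominates execution (50 ms), so the last contestant's result arrives roughly 1.9 s after submission, ignoring client-to-gateway latency.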

Point to Ponder

Question

In the workflow above, why don’t we check the plagiarism of submitted code before performing its execution/compilation?


Plagiarism services are usually time-consuming and may act as a head-of-line blocker. Executing the code first lets us verify its correctness before testing its originality.

Also, plagiarism testing is performed through third-party tools/services, which are usually quite expensive, so we limit those calls by checking only correct solutions for plagiarism.

Summary#

This chapter introduced a service integrating different microservices to perform code evaluations, discussions, interviews, and coding contests. We discussed the design decisions required for a seamless end-user experience. In the API model lesson, we learned how to define different endpoints and the message formats used to communicate with them. This chapter also discussed the factors that allow us to achieve the non-functional requirements. The chapter ended by calculating the latency budget, which demonstrated how quickly we can get a response from the APIs, even during a contest with thousands of participants.
